Space-Efficient Estimation of Robust Statistics and Distribution Testing
Abstract
The generic problem of estimation and inference given a sequence of i.i.d. samples has been extensively studied in the statistics, property testing, and learning communities. A natural quantity of interest is the sample complexity of the particular learning or estimation problem being considered. While sample complexity is an important component of the computational efficiency of the task, it is also natural to consider the space complexity: do we need to store all the samples as they are drawn, or is it sufficient to use memory that is significantly sublinear in the sample complexity? Surprisingly, this aspect of the complexity of estimation has received significantly less attention in all but a few specific cases. While space-bounded, sequential computation is the purview of the field of data-stream computation, almost all of the literature on the algorithmic theory of data streams considers only “empirical problems”, where the goal is to compute a function of the data present in the stream rather than to infer something about the source of the stream. Our contributions are twofold. First, we provide results connecting space efficiency to the estimation of robust statistics from a sequence of i.i.d. samples. Robust statistics are a particularly interesting class of statistics in our setting because, by definition, they are resilient to noise or errors in the sampled data. We show that this property is enough to ensure that very space-efficient stream algorithms exist for their estimation. In contrast, the numerical value of a “non-robust” statistic can change dramatically with additional samples, and this limits the utility of any finite-length sequence of samples. Second, we present a general result that captures a trade-off between sample and space complexity in the context of distributional property testing.
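To make the contrast between robust and non-robust statistics concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes a standard stochastic-approximation update for the median (move the estimate up or down by a shrinking step depending on which side each sample falls) and compares it with a running mean on a stream containing a few extreme outliers. The names `streaming_median`, `streaming_mean`, and `contaminated_stream`, the step size, the starting point, and the contamination model are all hypothetical choices made for illustration. Both estimators keep only a constant number of values in memory.

```python
import random
from typing import Iterable, Iterator


def streaming_median(samples: Iterable[float], step: float = 2.0,
                     start: float = 0.0) -> float:
    """Estimate the median with O(1) memory (illustrative sketch).

    Each sample moves the estimate by step/n toward the side it falls on,
    so a single extreme value can shift the estimate by at most step/n.
    The step size and starting point are assumptions and would need to be
    matched to the scale of the data in practice.
    """
    estimate = start
    for n, x in enumerate(samples, start=1):
        if x > estimate:
            estimate += step / n
        elif x < estimate:
            estimate -= step / n
    return estimate


def streaming_mean(samples: Iterable[float]) -> float:
    """Running mean with O(1) memory; a non-robust statistic."""
    mean = 0.0
    for n, x in enumerate(samples, start=1):
        mean += (x - mean) / n
    return mean


def contaminated_stream(n: int, outlier_rate: float = 0.05,
                        seed: int = 0) -> Iterator[float]:
    """Hypothetical source: N(5, 1) samples with occasional huge outliers."""
    rng = random.Random(seed)
    for _ in range(n):
        if rng.random() < outlier_rate:
            yield 1e6  # corrupted sample
        else:
            yield rng.gauss(5.0, 1.0)


if __name__ == "__main__":
    n = 100_000
    print("median estimate:", streaming_median(contaminated_stream(n)))
    print("mean estimate:  ", streaming_mean(contaminated_stream(n)))
```

Under these assumptions the median estimate should stay close to 5 while the running mean is pulled far away by the outliers, which illustrates why bounded sensitivity to individual samples goes hand in hand with small-memory estimation.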
Related resources
Testing the Exactitude of Estimation Methods in the Presence of Outliers: An accounting for Robust Kriging
Estimation of gold reserves and resources has been of interest to mining engineers and geologists for ages. Outlier values indicate the economic part of a deposit, provided that they do not result from human or technical errors. The presence of these high values causes a spurious, dramatic increase in the variance estimate of economic blocks when applying conventional m...
Robust tests for testing the parameters of a normal population
This article aims to provide a simple robust method to test the parameters of a normal population by using the new diagnostic tool called the “Forward Search” (FS) method. The most commonly used procedures to test the mean and variance of a normal distribution are Student’s t test and Chi-square test, respectively. These tests suffer from the presence of outliers. We introduce the FS version of...
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: With advances in science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimensional problems, classical methods are not reliable because of a large number of predictor variable...
On Performance of Reconstructed Middle Order Statistics in Exponential Distribution
In a number of life-testing experiments, there exist situations where the monitoring breaks down for a temporary period of time. In such cases, some parts of the ordered observations, for example the middle ones, are censored and the only outcomes available for analysis consist of the lower and upper order statistics. Therefore, the experimenter may not gain the complete information on fa...
Bayes, E-Bayes and Robust Bayes Premium Estimation and Prediction under the Squared Log Error Loss Function
In risk analysis based on Bayesian framework, premium calculation requires specification of a prior distribution for the risk parameter in the heterogeneous portfolio. When the prior knowledge is vague, the E-Bayesian and robust Bayesian analysis can be used to handle the uncertainty in specifying the prior distribution by considering a class of priors instead of a single prior. In th...
Publication date: 2010